Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional RNNoise support to AudioBridge #3185

Merged
merged 21 commits into from
Apr 8, 2024
Merged

Conversation

lminiero
Copy link
Member

This PR is an attempt to revive an effort initially contributed by @mirkobrankovic in #2260, by adding an optional and configurable support for RNNoise to the AudioBridge plugin: the idea is to basically add a mechanism to perform noise reduction in audio rooms with the help of the RNNoise library.

This patch is is quite different from Mirko's original contribution, though:

  • First of all, the original patch performed noise reduction on the resulting mix: my patch, instead, has separate denoising contexts for each participant, meaning that each participant may or may not be denoised (it can be changed dynamically). The main reason for that is to improve the end result, since if many speakers are noisy, adding their contributions to the mix sums noise too: denoising participants at the "source", instead, should help clean up the signal before it's added to the mix, at the expense of course of some more CPU usage due to the multiple denoisers.
  • Besides, Mirko's patch had a peculiar "packet skipping" feature, where you could tell the code to only denoise N packets out of M: not sure why it was done that way (the comments suggested the audio would be too robotic otherwise), but I didn't find any note related to that in the search I made for common practices when using RNNoise, so I didn't do that: in my patch, if denoising is enabled, you denoise all packets, so it's a on/off kind of thing.

Apart from that, I added ways to selectively enable/disable the feature. First of all, you can configure a room to enable denoising by default, by setting the denoise property to true: in that case, any participant that joins the room resunts in a denoiser instance created for them, unless they explicitly provide a denoise: false property when joining. A participant that joins a room where denoising is not enabled by default, can instead create a denoiser by joining with a denoise: true property. Denoising can also be enabled/disabled dynamically: participants can do that for themselves via a configure request, while room owners can use the synchronous denoise_enable and denoise_disable requests instead, by specifying the room and the specific participant to impact.

Coming to the actual implementation, it only partly works, due to some constraints in the RNNoise library that I haven't overcome entirely. Specifically, the rnnoise_process_frame function expects a buffer of exactly 480 samples to denoise: it can't be more and it can't be less. Depending on the sampling rate in use in the room, a single audio packet of 20ms received via WebRTC can contain a different number of samples: when using 16000 as a sampling rate, for instance, it will be 320 samples; 480 for 24000; 960 for 48000. This means that, at the moment, denoising works fine when you use a sampling rate of 24000 or 48000 (since can do one or two rounds of denoising with samples that are multiples of 480), while you get audio artifacts when using lower sampling rates instead. At the moment, I'm also getting artifacts when using stereo rooms (spatial audio): I do know that Opus uses interleaving when stereo is used, but even taking that into account (and so performing multiple rounds of denoising on different subsets of 480 samples) the artifacts are still there.

For this reasons, I decided to submit this as a draft pull request: in fact, it kinda works already, but it also definitely needs improvements, so I'm hoping that some fresh pairs of eyes looking at the code and/or people testing and providing feedback may help address what's currently not 100% working instead.

Tagging as multistream since I implemented this in 1.x, but this will be trivial to port to 0.x as well should this be merged eventually.

@lminiero lminiero added the multistream Related to Janus 1.x label Mar 17, 2023
@lminiero lminiero marked this pull request as ready for review May 25, 2023 15:16
@lminiero
Copy link
Member Author

I spent some time investigating what was wrong, and managed to fix the broken audio and artifacts in all combinations of sampling rate, mono and stereo alike. It seems to be working fine to me, now (even disabling/enabling denoising on the fly, where you can actually hear it at work), so I marked the PR as ready for review. That said, I'll obviously wait for more feedback before (if) we merge this, also taking into account the jitter buffer PR would need to be merged first (which means I'll have to rebase this accordingly).

@IlgamGabdullin
Copy link

Hi! Is there a chance that this feature will be in VideoRoom Plugin?

@lminiero
Copy link
Member Author

lminiero commented Sep 7, 2023

Hi! Is there a chance that this feature will be in VideoRoom Plugin?

No, because the VideoRoom plugin only relays packets, it doesn't decode them as the AudioBridge does.

@Odinvt
Copy link

Odinvt commented Sep 8, 2023

Hi! Thanks for this PR!

We were using a frontend version of the same RNNoise with the default model via WASM and it doesn't affect the original voice quality much. (https://github.com/jitsi/rnnoise-wasm)

But, when testing this PR it results in a very "strangled" voice quality compared to the WASM one.

Would you say that this is related to the fact that it's used on the original lossless microphone samples before encoding frontend side, or rather something related to the usage of RNNoise in this PR ?

PS: I could provide 3 simultaneous audio recordings of the original, denoised-frontend, denoised-janus sound in a noisy environment if it helps

Thanks again !

@lminiero
Copy link
Member Author

lminiero commented Sep 8, 2023

Would you say that this is related to the fact that it's used on the original lossless microphone samples before encoding frontend side, or rather something related to the usage of RNNoise in this PR ?

I think it depends on the sampling rate you're using in the AudioBridge. If you use, e.g., 16000, resampling will be performed, and I think it may be that the way we do it causes what you hear. If in a 48000 room the audio is good, and comparable in your tests, that's likely it.

@Odinvt
Copy link

Odinvt commented Sep 8, 2023

Indeed i retested with 24000/48000 room sampling rate and it works perfectly fine. I'll be moving this to production tonight for some 500 daily users (some OPUS/48000 some PCMA/8000 (SIP & PSTN bridged from the SIP plugin to the AudioBridge plugin)) I'll let you guys know how it works out for quality & CPU usage. Thanks again great work 🙏

@Odinvt
Copy link

Odinvt commented Sep 11, 2023

We've seen no significant CPU usage increase with this enabled in production (2 to 4 percentage points increase in average CPU usage across multiple 32 CPU VMs). Although, it does seem to add some "barely noticeable" cumulative delay in audio contributions with denoise enabled (just under the 200ms mark) for hour long conferences.

@atoppi
Copy link
Member

atoppi commented Sep 12, 2023

@Odinvt are you sure the observed delay is due to the denoiser?
We recently merged the jitter buffer in the master branch (now merged in this PR) and that might also explain a bit of latency.
To double-check this, use the Admin API for an handle with the delay and read the values of buffer-in and queue-in.

@Odinvt
Copy link

Odinvt commented Sep 12, 2023

@atoppi Indeed I was on v1.1.4 stable before. I only switched to v1.2.0 because this merge was rebased on top of speexdsp's jitter buffer. So i never got to actually test v1.2.0 separately. I will pay attention to the mentioned values for both denoise enabled and disabled, and get back to you asap. Thanks !

@spscream
Copy link
Contributor

Hi, we are testing your PR. Denoise is working when initially joined with denoise: true. But changes on the fly with new configure request aren't working:

{"body":{"denoise":false,"request":"configure"},"handle_id":8708560238099099,"janus":"message","session_id":6601772272643216,"transaction":"QUrR+WT5jq4"}
{"body":{"denoise":true,"request":"configure"},"handle_id":8708560238099099,"janus":"message","session_id":6601772272643216,"transaction":"+6ZA40wJa3g"}

requests abobe doesn't change anything on the fly. Is anything wrong with them?
I also tried with ice restart, sending new jsep and denoise flag, but it does no effect.

@brave44
Copy link
Contributor

brave44 commented Oct 27, 2023

@lminiero any plans to merge this?

@lminiero
Copy link
Member Author

requests abobe doesn't change anything on the fly

@spscream Does it work when using the synchronous denoise_enable and denoise_disable requests instead?

@brave44 we do plan to merge this, but not sure yet, as we're still waiting for more feedback; besides, we may have to improve how it works when doing 8000/16000 sampling rates.

@brave44
Copy link
Contributor

brave44 commented Oct 27, 2023

@brave44 we do plan to merge this, but not sure yet, as we're still waiting for more feedback; besides, we may have to improve how it works when doing 8000/16000 sampling rates.

Thanks, yeah, for us it works fine with 4800 sampling rate.

@atoppi
Copy link
Member

atoppi commented Nov 16, 2023

@Odinvt @spscream @brave44 the algorithm has been totally refactored, could you please test the last revision?

@Odinvt
Copy link

Odinvt commented Nov 16, 2023

@Odinvt are you sure the observed delay is due to the denoiser? We recently merged the jitter buffer in the master branch (now merged in this PR) and that might also explain a bit of latency. To double-check this, use the Admin API for an handle with the delay and read the values of buffer-in and queue-in.

I apologize for the late reply. The audio delay was due to our "tree" scaling's forwards going through a VXLAN over VPN which encryption was not hardware accelerated.
We've been running this branch with the jitter buffer rebase since September with no issues with 24000/48000 sampling rates in production after successful stress tests.

requests abobe doesn't change anything on the fly

@spscream Does it work when using the synchronous denoise_enable and denoise_disable requests instead?

@lminiero the synchronous requests work fine we've been using them in production also.

@Odinvt @spscream @brave44 the algorithm has been totally refactored, could you please test the last revision?

@atoppi Will do. Thanks !

@spscream
Copy link
Contributor

@atoppi thanks, we will check it

@lminiero
Copy link
Member Author

Can't help on forks, sorry. If you can find if any commit we made is causing the problem, please do let us know.

@spscream
Copy link
Contributor

@lminiero sure, we will do our best.
But fork isn't related to ab, only vr logic affected(some custom fields added to participants state).

@spscream
Copy link
Contributor

@lminiero we checked on master + our commit today and we have no troubles on it. We will check the latest changes on current branch afternoon.

@spscream
Copy link
Contributor

@lminiero moved to latest changes from this branch, no problems now for us.
We will monitor it until tomorrow, looks like everything is fine.

@spscream
Copy link
Contributor

spscream commented Nov 27, 2023

We have some troubles with audiobridge. If publisher have bad network connection we hear crack sounds and it appears on every call with such clients. After change ec52fbb is a bit better, but doesn't remove issue completly. How can we debug it and help to address it?

btw on vr publisher from the same users we don't have these cracks

@spscream
Copy link
Contributor

I tested it using clamsy utility https://jagt.github.io/clumsy/ - cracks starts after adding drops over 5%

@lminiero
Copy link
Member Author

Unless these reports are related to the RNNoise effort, this is the wrong place to talk about it.

@spscream
Copy link
Contributor

Unless these reports are related to the RNNoise effort, this is the wrong place to talk about it.

I created #3297 issue to adress it.

@mail2mhossain
Copy link

We are currently testing your PR. As part of our testing process, we're setting up an audio bridge room with the RNNoise feature activated, by setting the "denoise" option to true.

In certain circumstances, we've noticed an echo effect. This typically happens when using the laptop's built-in speaker and microphone. Interestingly, this issue does not occur when headphones are utilized.

@atoppi
Copy link
Member

atoppi commented Feb 5, 2024

@mail2mhossain that sounds more like an issue with the client environment.

This typically happens when using the laptop's built-in speaker and microphone. Interestingly, this issue does not occur when headphones are utilized

Do your clients have echo cancellation mechanisms?
Typically browsers do and it should be enabled by default if the device supports it.
A quick one-liner test in a browser is the following:

(await navigator.mediaDevices.getUserMedia({ audio: true})).getAudioTracks()[0].getSettings().echoCancellation

@lminiero
Copy link
Member Author

lminiero commented Feb 5, 2024

More importantly, is this indeed a problem with the RNNoise integration? Meaning, does it happen when RRNoise is enabled in the AudioBridge, but doesn't when the AudioBridge doesn't? If not, it's irrelevant to this PR, and as Alessandro said a client side problem (just wear a headset).

@mail2mhossain
Copy link

mail2mhossain commented Feb 6, 2024

It appears that the issue we're encountering, specifically the echo experienced when using the laptop's built-in speaker and microphone, may indeed be related to the absence of RNNoise integration. We are considering integrating RNNoise as a potential solution to address this challenge.

The issue occurs both when RRNoise is activated in the AudioBridge and also when it's not integrated with the AudioBridge.

Our application utilizes a desktop client that incorporates the Sipsorcery WebRTC library, and for audio functionalities, we're leveraging the SIPSorceryMedia.SDL2 library. Based on your feedback, it appears that echo cancellation features are expected to be handled by the client library, in this case, SIPSorceryMedia.SDL2. Consequently, I'm understanding that RNNoise might not support this specific type of noise cancellation, correct?

@mail2mhossain
Copy link

Could RNNoise effectively resolve the echo issue we're facing with the laptop's built-in speaker and microphone? Or, would it be more advisable to incorporate a noise reduction library on the client side?

@atoppi
Copy link
Member

atoppi commented Feb 15, 2024

Could RNNoise effectively resolve the echo issue we're facing with the laptop's built-in speaker and microphone? Or, would it be more advisable to incorporate a noise reduction library on the client side?

Noise suppression and echo suppression are very different processes.

Noise suppression subtracts from a generic stream what is being recognized as background noise and could be performed in both media server (audiobridge with RNNoise in this case) or in the endpoint.
On the other hand echo suppression is a process happening on an endpoint that subtracts from the microphone stream the echo that is bouncing back from the loudpseakers.

@lminiero
Copy link
Member Author

lminiero commented Apr 8, 2024

Merging.

@lminiero lminiero merged commit 2490567 into master Apr 8, 2024
8 checks passed
@lminiero lminiero deleted the rnnoise-audiobridge branch April 8, 2024 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multistream Related to Janus 1.x
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants